Validating the Unmind Index as a measure of mental health and wellbeing among adults in USA, Australia, and New Zealand

Background The Unmind Index is a 26-item, 7-subscale measure of mental health and wellbeing designed for use on the Unmind digital workplace mental health platform. The Unmind Index was developed and validated in the UK but is used internationally. This paper reports further psychometric validation of this measure for use in USA, Australia, and New Zealand (ANZ). Methods Participants in four countries completed the Unmind Index and a battery of existing measures. In Study 1 (N = 770), we validated the Unmind Index separately in USA and in ANZ. In Study 2 (N = 600), we used multiple group confirmatory factor analysis to test the measurement invariance of the Unmind Index across the UK, USA, and ANZ. Results Study 1 establishes the factor structure, reliability, convergent and discriminant validity, and measurement invariance by age and gender of the Unmind Index separately for USA and for ANZ. Study 2 further demonstrates measurement invariance across locations, and establishes benchmark scores by location, age, and gender. Conclusions We conclude that the Unmind Index is valid and reliable as a measure of mental health and wellbeing in these locations.


Introduction
Unmind is a workplace digital mental health platform that utilises tools to help users track, maintain, and improve their mental health and wellbeing [1]. One of the central features of the platform is the Unmind Index [2], a measure of mental health and wellbeing (MHWB) with seven subscales (Calmness, Connection, Coping, Happiness, Health, Fulfilment, and Sleep) [...] asked to rate how often each item applies to them on a 6-point Likert scale from "No days" (0) to "Every day" (5). Items were presented in random order.
The existing measures of mental health and personality used in this study, and the Unmind Index subscales they were expected to correlate with, are summarised in Table 1. We expected the PHQ-8 [7] to correlate negatively with the Happiness subscale, the GAD-7 [8] to correlate negatively with Calmness, the HADS [9] anxiety subscale to correlate negatively with Calmness and the HADS depression subscale negatively with Happiness, the Perceived Stress Scale [10] to correlate negatively with Coping, the PROMIS sleep disturbance short form [11] to correlate negatively with Sleep, the PROMIS-10 [12] physical health subscale to correlate positively with Health, and the Brief Inventory of Thriving [13] to correlate positively with Fulfilment. The Warwick-Edinburgh Mental Wellbeing Scale [14] was expected to correlate positively with the Unmind Index overall score. To establish the discriminant validity of the Unmind Index, we also included the Ten-Item Personality Inventory [15], a brief scale that measures individual differences in the "big five" personality traits (extraversion, agreeableness, conscientiousness, emotional stability, and openness to experience).
Analyses. All statistical analyses were performed in R (v4.0.3) [16]. Unless otherwise noted, all analyses were performed separately for USA participants and ANZ participants. Direct comparisons between locations are reported in Study 2.
Confirmatory factor analysis. The factor structure of the Unmind Index was tested through confirmatory factor analysis (CFA), using the lavaan package for R [17], with maximum-likelihood estimation. Because each item has only six response options, responses cannot meet the assumption of normality, so we used robust Huber-White standard errors and fit statistics. Our previous work [2] showed that a second-order factor structure (Fig 1) provided a good fit to Unmind Index data collected from UK participants, and this structure is used to calculate Unmind Index scores on the Unmind platform. In this structure, every item loads onto one of the seven Unmind Index subscales (Happiness, Sleep, Coping, Calmness, Health, Connection, and Fulfilment), and each subscale loads onto a general Mental Health and Wellbeing factor. A bifactor model was also considered in our previous work [2], but is not discussed further here. To explore relationships between subscales, we also fit a correlated-factors model, in which the subscales do not load onto a general factor and correlations between subscales are estimated directly. All latent factors were standardized to have a variance of 1.
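To make the difference between the two structures concrete, the equations below write both models in standard CFA notation. The symbols (λ for item loadings, γ for second-order loadings, Ψ for subscale disturbance variances, Φ for the subscale correlation matrix, Θ for item residual variances) are our own choice for illustration rather than the paper's, and the sketch assumes standardized first-order factors and a general factor with variance 1.

```latex
% Measurement part (both models): item i loads on subscale k(i)
x_i = \lambda_i \, \eta_{k(i)} + \varepsilon_i

% Second-order model: each subscale loads on the general MHWB factor \xi
\eta_k = \gamma_k \, \xi + \zeta_k, \qquad \operatorname{Var}(\xi) = 1
\Sigma_{\text{second-order}} = \Lambda \left( \boldsymbol{\gamma}\boldsymbol{\gamma}^{\top} + \Psi \right) \Lambda^{\top} + \Theta

% Correlated-factors model: subscale correlations \Phi estimated freely
\Sigma_{\text{correlated}} = \Lambda \, \Phi \, \Lambda^{\top} + \Theta

% Hence the second-order model restricts every between-subscale correlation to
\Phi_{kl} = \gamma_k \gamma_l \qquad (k \neq l)
```

With seven subscales, the second-order model must reproduce all 21 between-subscale correlations from only seven γ values. A pair of subscales that are more closely related than this product allows will therefore leave positive residuals that the correlated-factors model can absorb.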
Model fit was evaluated using several indices: the Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), Root Mean Square Error of Approximation (RMSEA), and Standardized Root Mean Square Residual (SRMR). For both CFI and TLI, values > .90 were considered acceptable and values > .95 good [18]. For RMSEA and SRMR, values between .06 and .08 were considered acceptable, and values < .06 good [19]. To identify potential causes of poor model fit, we inspected the correlation residual for each pair of items, that is, the mismatch between the correlation implied by the fitted model and the correlation observed in the data. Correlation residuals greater than 0.1 in absolute value were treated as notable departures [20].
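The cut-offs above can be expressed as a few small helpers. This is a hypothetical Python sketch, not part of the authors' R/lavaan pipeline: the function names are ours, and the residual check assumes the observed and model-implied correlation matrices are already available as nested lists (in practice they would come from the fitted CFA).

```python
def classify_incremental(value):
    """CFI or TLI: > .95 is good, > .90 acceptable, otherwise poor."""
    if value > 0.95:
        return "good"
    if value > 0.90:
        return "acceptable"
    return "poor"


def classify_absolute(value):
    """RMSEA or SRMR: < .06 is good, .06 to .08 acceptable, otherwise poor."""
    if value < 0.06:
        return "good"
    if value <= 0.08:
        return "acceptable"
    return "poor"


def flag_residuals(observed, implied, items, threshold=0.1):
    """Return (item_a, item_b, residual) for each item pair whose observed
    correlation differs from the model-implied one by more than `threshold`."""
    flagged = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            residual = observed[i][j] - implied[i][j]
            if abs(residual) > threshold:
                flagged.append((items[i], items[j], round(residual, 3)))
    return flagged


if __name__ == "__main__":
    # Correlated-factors fit for the USA sample, as reported in the text.
    print(classify_incremental(0.955), classify_incremental(0.948))  # good acceptable
    print(classify_absolute(0.056), classify_absolute(0.048))        # good good
```

Applied to the correlated-factors fit reported for the USA sample (CFI = .955, TLI = .948, RMSEA = .056, SRMR = .048), these rules classify CFI, RMSEA, and SRMR as good and TLI as acceptable.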
Test-retest reliability. Test-retest reliability was estimated for participants who completed the one-week follow-up questionnaire, using two-way, consistency, single-measure intraclass correlation coefficients, ICC(C,1).
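For readers unfamiliar with the ICC(C,1) variant, the sketch below computes it from two-way ANOVA mean squares as (MS_rows - MS_error) / (MS_rows + (k - 1) MS_error), where rows are participants and the k columns are testing occasions (k = 2 here). It is an illustrative pure-Python function with a name of our choosing; the paper's own estimates were computed in R.

```python
def icc_c1(scores):
    """ICC(C,1): two-way, consistency, single-measure intraclass correlation.

    scores: one row per participant, each row holding k repeated measurements
    (k = 2 for a test and a retest).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


if __name__ == "__main__":
    # Everyone scores exactly one point higher at retest: perfect consistency.
    print(icc_c1([[1, 2], [2, 3], [3, 4], [4, 5]]))  # 1.0
```

Because the consistency form removes the occasion effect, a uniform shift between test and retest does not lower ICC(C,1); only changes in participants' relative standing do.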
Internal consistency. To determine the internal consistency of the Unmind Index, we computed Cronbach's α. As the tau-equivalence assumption of α is rarely met in practice, we also calculated coefficient omega [21] as an indicator of internal consistency.
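Both coefficients are short formulas. The sketch below computes α from raw item scores and, as an assumption about which variant of omega is meant, omega from the standardized loadings and residual variances of a single-factor model, ω = (Σλ)² / ((Σλ)² + Σθ). Function names are illustrative.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha. `items` holds one list of scores per item,
    with participants in the same order in every list."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    sum_item_var = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))


def omega_single_factor(loadings, residual_variances):
    """Omega from a single-factor model (an assumed variant): squared sum
    of loadings divided by itself plus the summed residual variances."""
    common = sum(loadings) ** 2
    return common / (common + sum(residual_variances))


if __name__ == "__main__":
    # Four items, each with standardized loading .70 (residual variance .51).
    print(round(omega_single_factor([0.7] * 4, [0.51] * 4), 2))  # 0.79
```

Unlike α, omega lets each item carry its own loading, which is why it remains appropriate when tau-equivalence (equal loadings) does not hold.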
Convergent and discriminant validity. Pearson correlations were computed between each existing measure listed in Table 1 and Unmind Index scores, and adjusted for reliability (disattenuated) using the Cronbach's α estimates for each measure. Where the correlation between measures was predicted to be negative, for instance between Unmind Index Happiness scores and PHQ-8, the sign of the correlation was reversed to be positive for clarity. Unmind Index subscale scores were calculated by averaging responses within each subscale after reverse-scoring, and total Unmind Index scores as the average of the seven subscale scores.
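The scoring and correction steps can be sketched as follows. Responses run from 0 ("No days") to 5 ("Every day"), so reverse-scoring an item maps x to 5 - x. Which items are reverse-keyed is not listed here, so the `reverse_keyed` flags are hypothetical inputs. The disattenuation uses the standard correction r / √(α_x α_y), with the sign flipped for measures predicted to correlate negatively.

```python
import math

MAX_RESPONSE = 5  # 0 = "No days" ... 5 = "Every day"


def subscale_score(responses, reverse_keyed):
    """Mean of a subscale's item responses after reverse-scoring.
    `reverse_keyed` is a hypothetical list of flags, one per item."""
    scored = [MAX_RESPONSE - r if rev else r
              for r, rev in zip(responses, reverse_keyed)]
    return sum(scored) / len(scored)


def total_score(subscale_scores):
    """Total Unmind Index score: mean of the seven subscale scores."""
    return sum(subscale_scores) / len(subscale_scores)


def disattenuate(r, alpha_x, alpha_y, predicted_negative=False):
    """Correct a Pearson correlation for unreliability in both measures,
    flipping its sign when a negative association was predicted."""
    corrected = r / math.sqrt(alpha_x * alpha_y)
    return -corrected if predicted_negative else corrected
```

The correction grows as reliability falls. For instance, a hypothetical observed correlation of .30 with a measure of reliability .34 (the TIPI agreeableness reliability reported in the Discussion) and an Unmind subscale of assumed reliability .85 becomes .30 / √(.34 × .85) ≈ .56.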
Given the strong associations typically found between various mental health measures [22], we assessed convergent validity by checking that the correlations of Unmind Index subscale scores with the relevant existing measures (e.g., Happiness and PHQ-8) are (a) strong, and (b) stronger than their correlations with less relevant existing measures (e.g., Happiness and GAD-7). Discriminant validity was similarly assessed by checking that correlations between Unmind Index subscales and TIPI personality subscales are weak, and weaker than correlations between the Unmind Index and mental health measures.
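The two-part criteria above are stated qualitatively. The sketch below makes them explicit for one subscale; the cut-offs for "strong" (.50) and "weak" (.30) are purely assumptions for illustration, and the correlations in the usage lines are made-up numbers, not study results.

```python
def convergent_ok(correlations, target, strong=0.5):
    """Check both convergent criteria for one subscale.

    correlations: measure name -> sign-adjusted correlation with the subscale.
    target: the measure expected to correlate most strongly.
    strong: ASSUMED cut-off for a "strong" correlation (not from the paper).
    """
    target_r = correlations[target]
    others = [r for name, r in correlations.items() if name != target]
    return target_r >= strong and all(target_r > r for r in others)


def discriminant_ok(personality, mental_health, weak=0.3):
    """Personality correlations should be weak (ASSUMED cut-off `weak`) and
    weaker than every correlation with a mental health measure."""
    return (all(abs(r) < weak for r in personality.values())
            and max(abs(r) for r in personality.values())
            < min(mental_health.values()))


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    happiness = {"PHQ-8": 0.80, "GAD-7": 0.70, "PSS": 0.65}
    print(convergent_ok(happiness, "PHQ-8"))   # True
    fulfilment = {"BIT": 0.75, "WEMWBS": 0.75}
    print(convergent_ok(fulfilment, "BIT"))    # False: a tie is not "stronger"
```

The tie case mirrors the kind of exception reported in the Results for Fulfilment, where the WEMWBS correlation matched that of the expected measure.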
Measurement invariance. We used multiple-group CFA to test the measurement invariance of the Unmind Index across age and gender groups. This allows us to test that factor structures are consistent across groups, that loadings are consistent, and that scores are not biased by group differences in responses to individual items. These conditions must be met for scores to be validly compared across groups. Median participant age was 44 years in USA and 39 years in ANZ, and participants were classed as either older or younger than the median in each location. Nine participants who responded "Non-binary", "Other", or "Prefer not to say" when asked about their gender identity were excluded from the gender invariance analysis. Measurement invariance between locations is tested in Study 2.
Measurement invariance was tested in accordance with the steps outlined by Millsap [3]. We began by fitting a configural invariance model, where both groups have the same factor structure, but all parameter values are allowed to differ between groups. Achieving a good model fit here indicates that both groups have the same overall factor structure. We then compared this model to a weak/metric invariance model, where first- and second-level factor loadings are constrained to be equal across groups. If this constraint does not appreciably reduce model fit, we can conclude that factor weights are consistent across groups. Lastly, we fitted a strong/scalar invariance model, where item intercepts are also constrained to be equal, but factor means are allowed to differ between groups. If this does not show a poorer fit than the weak invariance model, we can conclude that item intercepts are equivalent across groups. In other words, any differences in factor scores are not driven by group differences on specific items. It is only appropriate to compare factor scores across groups if this third condition is met.
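The three nested models can be summarised in equations. The notation (τ for item intercepts, κ for the general-factor mean, superscript g for group) is ours, and the sketch assumes, for simplicity, that first-order factor intercepts are fixed at zero so that latent mean differences enter only through the general factor; the identification choices in the paper's lavaan models may differ.

```latex
% Group-specific second-order model, g = 1, ..., G
\mathbf{x}^{(g)} = \boldsymbol{\tau}^{(g)} + \Lambda^{(g)} \boldsymbol{\eta}^{(g)} + \boldsymbol{\varepsilon}^{(g)},
\qquad
\boldsymbol{\eta}^{(g)} = \boldsymbol{\gamma}^{(g)} \xi^{(g)} + \boldsymbol{\zeta}^{(g)},
\qquad
\operatorname{E}[\xi^{(g)}] = \kappa^{(g)}

% Configural: same pattern of free and zero loadings; all values free per group
% Weak (metric): \Lambda^{(g)} = \Lambda, \; \boldsymbol{\gamma}^{(g)} = \boldsymbol{\gamma} for all g
% Strong (scalar): additionally \boldsymbol{\tau}^{(g)} = \boldsymbol{\tau} for all g,
%                  with \kappa^{(1)} = 0 and \kappa^{(g)} free for g > 1

% Under strong invariance the expected item scores are
\operatorname{E}[\mathbf{x}^{(g)}] = \boldsymbol{\tau} + \Lambda \boldsymbol{\gamma} \, \kappa^{(g)}
```

Under strong invariance, any group difference in expected item scores must therefore pass through the latent mean κ^(g), which is the formal sense in which differences in factor scores are not driven by group differences on specific items.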
To compare model fits, we calculated CFI and the Bayesian Information Criterion (BIC) for each model. We considered a constrained model to fit worse than the unconstrained alternative if its CFI was more than 0.01 points lower [23], or if its BIC was higher. An increase in CFI, or a reduction of less than 0.01 points, together with a decrease in BIC, constitutes evidence for invariance.
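The decision rule can be written as a small function and applied step by step through the configural, weak, and strong models. This is an illustrative Python sketch with hypothetical function names. In the usage lines, the CFI values are those reported for the full Unmind Index in Study 2, and the BIC values are expressed relative to the configural model using the reported differences (130 points lower for the weak model, a further 68 lower for the strong model), since only differences matter for the comparison.

```python
def no_worse(free, constrained, cfi_tolerance=0.01):
    """free, constrained: (CFI, BIC) tuples for the less and the more
    constrained model. True if CFI drops by at most `cfi_tolerance`
    and BIC decreases, i.e. the evidence favours invariance."""
    cfi_free, bic_free = free
    cfi_con, bic_con = constrained
    return (cfi_con - cfi_free) >= -cfi_tolerance and bic_con < bic_free


def highest_invariance_level(fits):
    """fits: dict mapping 'configural', 'weak', 'strong' to (CFI, BIC).
    Steps through the nested sequence and stops at the first failure."""
    levels = ["configural", "weak", "strong"]
    reached = "configural"
    for previous, current in zip(levels, levels[1:]):
        if not no_worse(fits[previous], fits[current]):
            break
        reached = current
    return reached


if __name__ == "__main__":
    # Study 2, full Unmind Index: CFI as reported; BIC relative to configural.
    study2 = {"configural": (0.903, 0.0),
              "weak": (0.904, -130.0),
              "strong": (0.902, -198.0)}
    print(highest_invariance_level(study2))  # strong
```

Stopping at the first failed comparison reflects the nested logic of the procedure: scalar invariance is only meaningful once metric invariance holds.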
The analyses described above test whether measurement invariance holds for the 26-item Unmind Index as a whole, that is, whether the data are better accounted for by a model where all factor loadings and item intercepts are constrained to be equal across groups than by a model where all loadings and intercepts are allowed to vary. Similar analyses of individual subscales are reported in S1 File.

Results
Model fit.Table 2 shows CFA model fit indices for the second-order factor model, fit to data from USA and ANZ, indicating acceptable fit to the data in both locations.
Standardised item-to-factor loadings and residual variance estimates for each item are shown in Table 3. Means and standard deviations for each subscale and for the overall score are shown in Table 4. Correlations between subscales are shown in Table 5.
Correlation residuals greater than 0. [...] 6. In both the USA and ANZ, internal consistency and test-retest reliability (ICC(C,1)) were good for all subscales and for the total score. There were no clear differences in consistency or reliability between locations.
Convergent and discriminant validity. USA. Correlations between Unmind Index subscales and existing measures for USA participants, corrected for attenuation, are shown in Fig 3. Correlations without disattenuation are reported in S1 File and show consistent results unless otherwise noted.
In general, Unmind Index subscales were most strongly associated with the expected measures of mental health and wellbeing, slightly less strongly associated with other measures of mental health and wellbeing, and only weakly associated with personality traits. However, there were several exceptions. The association between the Fulfilment subscale and the WEMWBS, a general measure of wellbeing, was as strong as that between Fulfilment and the Brief Inventory of Thriving, the measure expected to correlate most strongly with this subscale. Although, as expected, the Happiness subscale was most strongly associated with PHQ-8 scores, its association with the HADS depression subscale was weaker than expected, and of the same magnitude as its associations with the HADS anxiety subscale, GAD-7, and PSS. This suggests that the Happiness subscale measures a construct related to depression, anxiety, and stress, rather than depression alone. The Health subscale was more strongly associated with PROMIS-10 combined (physical and mental) health scores than with PROMIS-10 physical health scores. Finally, TIPI emotional stability and agreeableness scores were both moderately or strongly associated with scores on the Unmind Index Calmness, Connection, Coping, Fulfilment, and Happiness subscales.
ANZ. Equivalent correlations for ANZ participants are shown in Fig 4, with correlations without disattenuation reported in S1 File. The overall pattern of correlations was as expected, but there were once again some exceptions. As for USA participants, Fulfilment subscale scores were more strongly associated with WEMWBS scores, and Health subscale scores more strongly associated with PROMIS-10 overall health scores, than expected. TIPI emotional stability and agreeableness scores were moderately to strongly associated with several Unmind Index subscales. Unmind Index Happiness scores were most strongly associated with PSS scores, followed by PHQ-8, the HADS depression subscale, and the PROMIS-10 mental health subscale.
Invariance. Measurement invariance results by age and gender, for USA and ANZ participants, are reported in Table 7. For all comparisons, BIC was lowest for the strong invariance model, and CFI values for the strong invariance model were higher, or lower by less than .01. We therefore conclude that the Unmind Index shows measurement invariance by age and by gender, in both the USA and ANZ. CFI values were generally below the "acceptable" cut-off of 0.9. However, as discussed above, this largely reflects the inability of the second-order factor model to account for correlations between the Calmness and Happiness subscales, and factor scores estimated from this slightly mis-specified model correlate almost perfectly with scores estimated from a model that estimates factor correlations directly. Finally, Unmind Index scores were higher for older participants and for male participants. These patterns are consistent with the results of Study 2, presented in detail below.
Measurement invariance results for each of the seven Unmind Index subscales by age and gender are reported in S1 File.By gender, all scales showed evidence of strong measurement invariance in both USA and ANZ.By age group, there was some evidence of violation of measurement invariance for the Coping subscale for USA participants, and the Calmness, Fulfilment, and Happiness subscales for ANZ participants.Of these violations, only one was replicated in Study 2: weak but not strong invariance for the Fulfilment subscale by age groups (reported in S1 File).

Discussion
These results indicate that the second-order factor structure of the Unmind Index provides an acceptable fit to data from USA and ANZ. However, model fit was not ideal, and inspection of the correlation residuals indicated that the Happiness and Calmness subscales are more strongly correlated than would be expected given the second-order model. This is consistent with the results reported in our previous UK validation study [2]. These subscales capture symptoms associated with depression and anxiety, respectively. Given the known associations between depression and anxiety [24], it is unsurprising that these subscales should be more strongly correlated with each other than with other subscales such as Sleep or Fulfilment. Although these subscales are strongly correlated, we believe that Unmind's users are best served by maintaining two distinct subscales in the second-order factor structure, since the subjective experiences associated with depression symptoms and anxiety symptoms are quite different, and the Unmind platform provides distinct resources for addressing each set of symptoms. This is in line with diagnostic theory and clinical practice [24]. Our analyses also showed that our decision to use a second-order structure rather than the better-fitting correlated-factors structure does not distort the scores obtained on each subscale, as scores from the two structures are almost perfectly correlated.
Also in line with our previous UK validation study [2], the current results indicate that the Unmind Index total score shows excellent reliability, and subscale scores show good reliability. The current convergent validity results are also broadly consistent with our predictions, and with the results obtained in the UK sample, with a few exceptions described above. These exceptions may reflect international heterogeneity in some of the constructs in question, although it is not clear whether they reflect differences in the behaviour of the Unmind Index across locations, or differences in the construct validity of other measures, such as the Perceived Stress Scale. Unfortunately, few measures are at present separately validated for use across different English-speaking locations. We found that several subscales were estimated to correlate more strongly than expected with personality traits assessed by TIPI, in particular the emotional stability and agreeableness traits. However, these correlation estimates are to some degree inflated by the low reliability of the TIPI measures (α = .75 and .34, respectively), which is taken into account when estimating the disattenuated correlation coefficients. It should also be noted that emotional stability has previously been shown to correlate strongly with existing measures of mental health problems [25]. Finally, the Unmind Index as a whole displayed strong measurement invariance by age and by gender in both USA and ANZ. Invariance results for the subscales are discussed below.

Study 2: Invariance by location
In Study 1, we established that the Unmind Index is a valid and reliable measure of mental health and wellbeing in USA and in ANZ. We previously established the same conclusions in the UK [2]. Our next goal was to establish whether Unmind Index scores can be validly compared across these locations, that is, whether the measure shows measurement invariance across locations, and if so, to compare scores obtained in each location and establish appropriate benchmarks for standardised scoring. Given the different times at which the data described in Study 1 and in our original UK validation [2] were collected, it would not be appropriate to directly compare results across these datasets. For this reason, we decided to obtain a new dataset of participants in the UK, USA, and ANZ, collected concurrently.

Methods
Participants. A total of 600 participants were recruited using the Prolific platform, and the sample was stratified by location (UK, USA, and ANZ), age (18-42 years, 43 and over), and sex (male and female) into twelve subgroups of 50 participants each. Testing took place on November 8th, 2021. Detailed characteristics of this sample are reported in S1 File.
Measures.Participants were presented with the Unmind Index, followed by the demographic questions from Study 1.
Analyses.Measurement invariance.To establish measurement invariance across locations, we used multiple-group confirmatory factor analysis, fitting the second-order factor model with parameters allowed to vary or constrained to be equal between locations, as described above.We report results from omnibus tests comparing models in which parameters are constrained to be equal in all three locations to models in which parameters are allowed to vary.All measurement invariance analyses were also conducted for each of the seven Unmind Index subscales individually, using single-factor CFA models.
Group comparisons. To explore differences in Unmind Index scores by location, age, and gender, we calculated the mean, standard deviation, and standard error of scores within each subgroup. For this purpose, we split participants into four age groups: 18-25, 26-40, 41-50, and 51-84. Due to the small numbers involved, participants who reported genders other than "male" or "female" were excluded from these analyses.
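The benchmark tables amount to grouped summary statistics. The sketch below, with hypothetical function names and toy input, bins age into the four groups listed above, drops participants reporting other genders, and returns n, mean, SD, and SE = SD/√n for each location × gender × age-group cell.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

AGE_GROUPS = [(18, 25, "18-25"), (26, 40, "26-40"), (41, 50, "41-50"), (51, 84, "51-84")]


def age_group(age):
    """Return the age-group label for an age in years, or None if outside 18-84."""
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    return None


def benchmark_table(records):
    """records: iterable of (location, gender, age, score) tuples.
    Returns {(location, gender, age_group): (n, mean, sd, se)}."""
    cells = defaultdict(list)
    for location, gender, age, score in records:
        group = age_group(age)
        if gender not in ("male", "female") or group is None:
            continue  # other genders are excluded, as described above
        cells[(location, gender, group)].append(score)

    table = {}
    for key, scores in cells.items():
        n = len(scores)
        sd = stdev(scores) if n > 1 else float("nan")  # sample SD needs n >= 2
        table[key] = (n, mean(scores), sd, sd / math.sqrt(n))
    return table
```

Cells with a single participant get an undefined SD (NaN), since a sample standard deviation needs at least two observations.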
Complete tables of these benchmark statistics for each Unmind Index subscale are reported in S1 File.To visualise these patterns, we treat age as a continuous variable and plot loess-smoothed estimates using the ggplot2 package for R [26].Finally, to summarise these patterns, we fit a linear model with location, gender, and age as predictors.Age was centred on the mean of 40 years and divided by 10, "female" was coded as the baseline for gender, and "UK" as the baseline for location.As a result, the intercept term is an estimate for UK female participants aged 40, and the remaining coefficients indicate how scores differ from this reference value, with the age coefficient reflecting the change in scores for a 10-year increase in age.For clarity, we show plots and report regression results for only total Unmind Index scores below.Full results are reported in S1 File.As these analyses are exploratory, we do not report p-values for hypothesis tests.
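To make the coding scheme concrete, the sketch below builds the design matrix as described (age centred on 40 and divided by 10, female and UK as baselines) and fits ordinary least squares via the normal equations in plain Python. It is a dependency-free illustration with made-up helper names, not the authors' R code; with real data one would use a standard regression routine.

```python
def design_row(location, gender, age):
    """Predictors for one participant, coded as described above."""
    return [
        1.0,                                  # intercept: UK, female, age 40
        (age - 40) / 10,                      # age centred on 40, per 10 years
        1.0 if gender == "male" else 0.0,     # female is the baseline
        1.0 if location == "USA" else 0.0,    # UK is the baseline
        1.0 if location == "ANZ" else 0.0,
    ]


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(records):
    """records: (location, gender, age, score) tuples. Returns coefficients
    [intercept, age_per_decade, male, USA, ANZ] from the normal equations."""
    X = [design_row(loc, gender, age) for loc, gender, age, _ in records]
    y = [rec[3] for rec in records]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)
```

Because of this coding, the first coefficient is the expected score for a 40-year-old UK female participant, the second is the change per decade of age, and the remaining three are offsets for male, USA, and ANZ participants relative to that reference.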

Results
Invariance. Measurement invariance results are summarised in Table 8. For the full Unmind Index, CFI for the configural invariance model, where all parameters are allowed to vary across locations, was .903. This value is acceptable, and consistent with the results of Study 1 (CFI = .910 for USA, .917 for ANZ). Constraining factor loadings to be equal across locations in the weak invariance model increased CFI by .001 to .904, and additionally constraining item intercepts to be equal in the strong invariance model reduced CFI by only .003 to .902. Consistent with this, BIC was 68 points lower for the strong invariance model than for the weak model, and 130 points lower for the weak model than for the configural model. We therefore conclude that the Unmind Index displays strong measurement invariance across locations, and so scores can be compared across locations.
Similar results were found for each subscale (Table 8), with all CFI values ≥ .972 for configural invariance models, no reduction in CFI greater than .008 when constraining parameters by location, and the strong invariance model obtaining the lowest BIC for all subscales. We therefore conclude that Unmind Index subscale scores also display strong measurement invariance, and can be compared across locations.
Benchmarks and group comparisons.Benchmarks for overall Unmind Index scores by location, gender, and age are shown in Fig 5 and Table 9. Full tables are reported in S1 File.Scores were consistently higher for male participants, and for older participants, but did not differ systematically between locations.Linear model coefficients are reported in Table 10.

Discussion
These results provide evidence that the Unmind Index total score and the individual subscales display strong measurement invariance across participants from the UK, USA, and ANZ.

General discussion
Taken together, our results establish that the Unmind Index is an appropriate measure of MHWB in the UK, USA, and Australia/New Zealand. In Study 1, we demonstrated that the second-order factor model of the Unmind Index adequately captures the covariance structure of the 26 items that make up the Unmind Index in USA and ANZ samples. Furthermore, this model yields factor scores that are almost perfectly correlated with those from a more complex correlated-factors model that captures the structure very well. We also demonstrated good reliability (internal consistency and test-retest reliability) for all seven subscales, and excellent reliability for the total score, in both locations.
Correlations with existing measures of mental health and wellbeing were strong and broadly as expected in both locations. Some correlations with related existing measures were stronger than expected, e.g., between the Fulfilment subscale and the WEMWBS, a measure of general mental wellbeing. This suggests that the Unmind Index subscales are not always highly specific measures of particular aspects of MHWB. However, such cross-correlations are common among MHWB measures, and likely reflect the transdiagnostic nature of many psychological attributes [22]. In future work, we hope to explore this validation further from a transdiagnostic perspective.
In Study 1 we also established that the overall mental health and wellbeing score showed evidence of strong measurement invariance by gender and by age group in both locations. In Study 2, we found evidence of measurement invariance by location (UK, USA, or ANZ) for overall scores and for all subscales. All subscales displayed strong measurement invariance by [...] Measurement invariance is a necessary condition for comparing scores across groups; if measurement invariance does not hold, scores cannot be validly compared across groups. It should be noted, however, that it is not a sufficient condition for comparison, and there may be other sources of bias in comparing scores from men and women, older or younger users, or users in different locations, that are not captured by these analyses.
We would also note that this study addresses the validity and invariance of the Unmind Index in several Western, English-speaking, industrialised countries: the UK, USA, Australia, and New Zealand. These results may well generalise to similar countries, such as Canada or Ireland. However, further work is required to establish the validity and psychometric characteristics of the Unmind Index in non-Western and non-English-speaking locations. This work is ongoing.
We noted above that it is rare for MHWB measures developed in one English-speaking country to be properly validated for use in other such countries. Our results show that the Unmind Index, developed in the UK, is indeed valid for use in the United States, Australia, and New Zealand. However, it is not yet clear to what extent these results would generalise to other measures of MHWB. We would therefore encourage researchers and practitioners to consider validating the measures they use whenever possible, even if those measures have been validated for use in other English-speaking countries.
This work has a few limitations that should be noted. Since recruitment was carried out using the Prolific platform, the participants sampled were of course limited to users of that platform, and biased towards more active users. We cannot rule out unmeasured differences between this sample and the general population. However, this limitation is common to all but the most sophisticated survey studies. A related limitation is the smaller-than-planned sample size of older participants in Australia/New Zealand. We also note that data collection took place throughout 2021, during the COVID-19 pandemic. Interestingly, previous research [28] has shown that at least one measure of state affect and one measure of trait affect show strict measurement invariance when comparing data from before and during the acute phase of the pandemic. In general, though, it is not known how the psychometric properties of MHWB measures are affected by major events such as the COVID-19 pandemic.
To conclude, our results indicate that the Unmind Index is fit for purpose as a multifactor measure of MHWB for users in the UK, United States, Australia, and New Zealand. They also indicate no issues in comparing Unmind Index scores across age or gender groups, or across locations.

Fig 1. The second-order factor structure used for the Unmind Index. https://doi.org/10.1371/journal.pone.0287215.g001

[...] 1 in absolute value for the second-order model fit to USA data are shown in Fig 2A. Corresponding residuals for ANZ data are shown in S1 File. These show that the second-order model could not fully explain the positive correlations between items in the Happiness and Calmness subscales. Model fit was substantially improved in the correlated-factors model, which explicitly models correlations between subscales: χ²(278) = 563.4, SRMR = .048, RMSEA = .056 (90% CI [.049, .063]), CFI = .955, TLI = .948. Correlation residuals for this model (Fig 2B) show that the associations between Happiness and Calmness items are captured by the direct correlation between these two factors, but substantial unexplained correlations remain between the third Sleep item ("Had trouble falling or [...]

Fig 2. Correlation residuals greater than 0.1 in absolute value for the second-order (A) and correlated-factors (B) CFA models, for the USA sample. Large residuals reflect ways in which a model fails to fully capture the correlation between pairs of items. https://doi.org/10.1371/journal.pone.0287215.g002

Fig 3 .Fig 4 .
Fig 3. Disattenuated absolute correlation coefficients between Unmind Index scores and existing measures for the USA sample. Values in red show correlations with mental health and wellbeing measures predicted to correlate most strongly with the Unmind Index subscale in question. Values in blue show personality measures, which were expected to correlate most weakly with all scales. Error bars show standard error. https://doi.org/10.1371/journal.pone.0287215.g003

Table 1 . Established measures used to test concurrent and discriminant validity of the Unmind Index.
Reliability estimates are averages across USA and ANZ samples.

Table 8 . Tests of measurement invariance for the Unmind Index and its individual subscales across locations (UK, USA, and ANZ).
Results indicate that strong measurement invariance holds for the full Unmind Index, and individually for each subscale. https://doi.org/10.1371/journal.pone.0287215.t008

Table 9 . Benchmark means (± standard deviations) of total Unmind Index scores by location, gender, and age group, from Study 2.
https://doi.org/10.1371/journal.pone.0287215.t009

[...] both studies, but the Fulfilment subscale showed only weak invariance by age group in both studies. Inspection of item means by group (not reported) revealed that this lack of invariance was due to older participants scoring 0.8 points higher on all Fulfilment items except "[...] felt that I am growing positively as a person", on which older and younger participants did not differ. This is perhaps unsurprising, and consistent with recent work showing that the Subjective Happiness Scale, a similar measure, also shows only weak invariance by age [27].